A New Low-Rank Tensor Model for Video Completion

Wenrui Hu, Dacheng Tao, Wensheng Zhang, Yuan Xie, and Yehui Yang 


Abstract —In this paper, we propose a new low-rank tensor model based on the circulant algebra, namely, twist tensor nuclear norm or 
t-TNN for short. The twist tensor denotes a 3-way tensor representation to laterally store 2D data slices in order. On one hand, t-TNN 
convexly relaxes the tensor multi-rank of the twist tensor in the Fourier domain, which allows an efficient computation using FFT. On the 
other, t-TNN is equal to the nuclear norm of block circulant matricization of the twist tensor in the original domain, which extends the 
traditional matrix nuclear norm in a block circulant way. We test the t-TNN model on a video completion application that aims to fill 
missing values and the experiment results validate its effectiveness, especially when dealing with video recorded by a non-stationary 
panning camera. The block circulant matricization of the twist tensor can be transformed into a circulant block representation with 
nuclear norm invariance. This representation, after transformation, exploits the horizontal translation relationship between the frames in 
a video, and endows the t-TNN model with a more powerful ability to reconstruct panning videos than the existing state-of-the-art 
low-rank models. 

Index Terms —Low-rank tensor estimation, tensor multi-rank, tensor nuclear norm, twist tensor, video completion.

1 Introduction

Low-rank tensor estimation (LRTE), which reveals the algebraic structure of multi-dimensional data (also referred to as tensors), is a rapidly growing area of research in many fields, such as computer vision [1], signal processing [2], data mining [3] and machine learning [4]. At the core of LRTE lies low-rank tensor decomposition [5], and for tensors of order higher than 2, two decompositions are commonly used, i.e., CANDECOMP/PARAFAC (CP) [6] and Tucker decomposition [7].

The CP model factorizes a tensor into a sum of rank-1 tensors, but it suffers from known computational and ill-posedness issues [8]. The Tucker model, as an economical surrogate, extends the notion of matrix rank to rank-N for an N-dimensional tensor [9], and then forces the unfolding matrices of the tensor along each mode (i.e., single dimension) to be low-rank using the matrix SVD-based factorization method. Tucker decomposition usually requires the rank of each unfolding matrix to be specified as a prior, which tends to over- or under-estimate the true rank [10]. Moreover, Tucker decomposition suffers from local minima [5] as a result of non-convex optimization.

To overcome the above-mentioned drawbacks of the Tucker model, a convex relaxation technique, namely the sum of nuclear norms (SNN) of unfolding matrices, has been proposed [11], [12], [13]. SNN penalizes all unfolding matrices with the nuclear norm and serves as a tractable measure of rank-N in practical settings. Requiring all modes to be simultaneously low-rank might be too strong an assumption for a tensor, however. In consideration of this, a latent nuclear norm model (LNN) is introduced in [14]. LNN only requires that the tensor be the sum of a set of component tensors, each of which is low-rank in the corresponding mode, and this strategy enables LNN to automatically detect the rank-deficient modes.

Note that neither SNN nor LNN exploits the correlations between modes. Furthermore, they try to model the tensor in the matrix SVD-based vector space, which results in loss of optimality in the representation. With these motivations, a tensor nuclear norm (TNN) is introduced into the LRTE problem for various tasks [15], [16], [17]. The TNN model is based upon a new tensor decomposition scheme in [18], [19], [20], which the authors refer to as tensor-SVD or t-SVD for short. t-SVD has a similar structure to the matrix SVD and models a tensor in the matrix space through a defined t-product operation [20]. By transforming into the nuclear norm of the block circulant representation, TNN simultaneously characterizes the low rank of a tensor along various modes.

In this paper, we propose a new low-rank tensor model, twist tensor nuclear norm (t-TNN), for 3D video completion, in which the video sequence recorded by a stationary or non-stationary camera is considered as a low-rank tensor (exactly or approximately). In the t-TNN model, we design a 3-way tensor representation named twist tensor which laterally stores 2D data slices in order; the twist tensor is then used to exploit the low-rank structure of data based on the t-SVD framework. Because it equals the nuclear norm of the block circulant matricization of the twist tensor, t-TNN bridges the t-SVD based tensor nuclear norm and the traditional matrix nuclear norm (MNN) [21]. This bridging enables t-TNN not only to exploit the correlations between all the modes simultaneously but also to take advantage of the low-rank prior along a certain mode which is rooted in some types of tensor data, e.g., video sequences over the time dimension, hyperspectral images via the wavelength variable, and face image samples through the number index.

In the video completion application, the t-TNN model is verified as being more effective in reconstructing texture and fine detail than the existing state-of-the-art low-rank models (including the generalized TNN (GTNN) in [15], [16], [17]), especially when dealing with video recorded by a non-stationary panning camera. We interpret this phenomenon by transforming the block circulant matricization of the twist tensor into a circulant block representation with nuclear norm invariance. This representation exploits the horizontal translation relationship between frames in a video, which gives the t-TNN model a suitable low-rank description for panning videos without translation compensation.

The rest of this paper is organized as follows. Section 2 introduces related works and highlights the challenges of video completion. Section 3 gives the preliminaries on tensors and the notations that will be used throughout the paper. Section 4 describes the proposed model and algorithm for the LRTE problem in detail. Experimental analysis and completion results are given in Section 5 to verify our method. Lastly, Section 6 gives concluding remarks and future directions.

2 Related Works of Video Completion 

Video completion or inpainting is a computer vision technique to fill the missing values of video sequences in a seamless manner. It is critical for applications in video repairing, video editing, and movie postproduction, to name but a few [23]. The missing values can be caused by many circumstances [24], e.g., natural noise in the video capture equipment, errors in data conversion/communication, occlusion by obstacles in the environment, and the segmenting or removal of interesting objects from videos. Fig. 1 illustrates the degradation of a video by insufficient sampling at a rate of 0.3 (see the middle panel) and random occlusion with a 25 x 25 black block (see the bottom panel).

Figure 1. A panning video sequence containing multiple moving objects. Missing pixel imputation (middle panel) and occlusion correction (bottom panel) can be formalized as the video completion problem. An important characteristic of video data is the temporal redundancy, and the effective utilization of this prior information can improve the performance of video completion.

As the space-time equivalent of image completion [25], video completion inherits and extends the solutions of the original 2D problem. For example, [26] utilizes the spatial partial differential equation (PDE) based image inpainting method [27] to complete the video frame-by-frame, and in [28], Jia et al. extend the image repairing method [29] to the video case. Video completion also poses a number of challenges beyond image completion, however, mainly temporal coherency and spatial complexity. As a result, the method in [26] often causes more abrupt changes on temporal edges than on spatial edges, while [28] involves a gamut of different techniques that make the process of video completion very complicated [30]. Considering that temporal information can significantly improve video completion, some works, e.g., [31] and [32], exploit the temporal redundancy at the
cost of high computational burden. Most recently, low- 
rank based methods ca, ca, Ea have achieved good 
estimations on missing values by counting the global space- 
time information 1^ . 

In this paper, we hold that the effectiveness of video 
completion largely depends on the effective utilization of 
the temporal redundancy between frames and the spatial 
relationships between entries. By exploiting the low-rank 
property of the video data in the twist tensor representation, 
our method is able to deal with a variety of challenging 
situations that arise in video completion, such as the correct 
reconstruction of dynamic textures, multiple moving objects 
and moving background [64l . Here, we assume that the 
motion of the background is caused by a camera with pan 
motions parallel to the scene, which occurs in a wide range 
of real circumstances [30].


3 Notations and Preliminaries 

In this section, we introduce the notations and give the basic definitions used in the rest of the paper. We use calligraphic letters for tensors, e.g., $\mathcal{X}$, upper case letters for matrices, e.g., $X$, bold lower case letters for vectors, e.g., $\mathbf{x}$, and lower case letters for the entries, e.g., $x_{ij}$. The Frobenius norm of a matrix $X$ is defined as $\|X\|_F := (\sum_{i,j} |x_{ij}|^2)^{1/2}$. Let $X = U \Sigma V^T$ be the SVD of $X$ and $\sigma_i(X)$ the $i$th largest singular value, then the MNN of $X$ is $\|X\|_* := \sum_i \sigma_i(X)$. The corresponding singular-value thresholding (SVT) operation with threshold $\tau$ is $D_\tau(X) = U \Sigma_\tau V^T$, where $\Sigma_\tau = \mathrm{diag}\{(\sigma_i(X) - \tau)_+\}$ and $t_+$ is the positive part of $t$.
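The SVT operator just defined can be sketched in a few lines of NumPy. This is only an illustration; the function names `svt` and `nuclear_norm` are ours and do not come from the paper.

```python
import numpy as np

def svt(X, tau):
    """Singular-value thresholding: U diag((sigma_i - tau)_+) V^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # positive part (sigma_i - tau)_+
    return (U * s_shrunk) @ Vt            # scale the columns of U, then recombine

def nuclear_norm(X):
    """Matrix nuclear norm (MNN): the sum of the singular values."""
    return np.linalg.svd(X, compute_uv=False).sum()

# Small example: singular values 3 and 1 are shrunk by tau = 2.
X = np.diag([3.0, 1.0])
Y = svt(X, tau=2.0)
```

Shrinking by $\tau$ lowers every singular value and zeroes those below the threshold, which is why SVT acts as the proximal step for the nuclear norm in the completion algorithms discussed in this paper.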

An N-way (or N-mode) tensor is a multi-linear structure in $\mathbb{R}^{n_1 \times n_2 \times \cdots \times n_N}$. A slice of a tensor is a 2D section defined by fixing all but two indices, and a fiber is a 1D section defined by fixing all indices but one [5]. For a 3-way tensor $\mathcal{X}$, we will use the Matlab notation $\mathcal{X}(k,:,:)$, $\mathcal{X}(:,k,:)$ and $\mathcal{X}(:,:,k)$ to denote respectively the $k$th horizontal, lateral and frontal slices, $\mathcal{X}(:,i,j)$, $\mathcal{X}(i,:,j)$ and $\mathcal{X}(i,j,:)$ to denote the mode-1, mode-2 and mode-3 fibers, and $\mathcal{X}_f = \mathrm{fft}(\mathcal{X}, [\,], 3)$ to denote the Fourier transform along the third dimension. In particular, $X^{(k)}$ is used to represent $\mathcal{X}(:,:,k)$. The mode-$l$ unfolding $X_{(l)} \in \mathbb{R}^{n_l \times \prod_{l' \neq l} n_{l'}}$ is a matrix whose columns are mode-$l$ fibers [5]. The opposite operation "fold" of the unfolding is defined as $\mathrm{fold}_l(X_{(l)}) = \mathcal{X}$. The Frobenius norm of $\mathcal{X}$ is $\|\mathcal{X}\|_F := (\sum_{i,j,k} |x_{ijk}|^2)^{1/2}$ and the $\ell_1$ norm of $\mathcal{X}$ is $\|\mathcal{X}\|_1 := \sum_{i,j,k} |x_{ijk}|$.

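To construct our tensor nuclear norm based on t-SVD, it is necessary to introduce five block-based operators, i.e., bcirc, bvec, bvfold, bdiag and bdfold [20], in advance. For $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the frontal slices $X^{(k)}$ can be used to form the block circulant matrix

$\mathrm{bcirc}(\mathcal{X}) := \begin{bmatrix} X^{(1)} & X^{(n_3)} & \cdots & X^{(2)} \\ X^{(2)} & X^{(1)} & \cdots & X^{(3)} \\ \vdots & \vdots & \ddots & \vdots \\ X^{(n_3)} & X^{(n_3-1)} & \cdots & X^{(1)} \end{bmatrix}$, (1)

the block vectorizing and its opposite operation

$\mathrm{bvec}(\mathcal{X}) := \begin{bmatrix} X^{(1)} \\ X^{(2)} \\ \vdots \\ X^{(n_3)} \end{bmatrix}$, $\mathrm{bvfold}(\mathrm{bvec}(\mathcal{X})) = \mathcal{X}$, (2)

and the block diag matrix and its opposite operation

$\mathrm{bdiag}(\mathcal{X}) := \begin{bmatrix} X^{(1)} & & \\ & \ddots & \\ & & X^{(n_3)} \end{bmatrix}$, $\mathrm{bdfold}(\mathrm{bdiag}(\mathcal{X})) = \mathcal{X}$. (3)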
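To make the five block operators concrete, the following NumPy sketch builds them for a small tensor stored as an array of shape (n1, n2, n3), with frontal slices along the last axis. The storage layout and the explicit `n3` argument of the fold functions are our assumptions for illustration.

```python
import numpy as np

def bcirc(X):
    """Block circulant matrix: block (a, b) is the frontal slice X^((a - b) mod n3)."""
    n1, n2, n3 = X.shape
    idx = (np.arange(n3)[:, None] - np.arange(n3)[None, :]) % n3
    blocks = X[:, :, idx]                 # blocks[i, j, a, b] = X[i, j, (a - b) % n3]
    return blocks.transpose(2, 0, 3, 1).reshape(n3 * n1, n3 * n2)

def bvec(X):
    """Stack the frontal slices vertically."""
    return np.concatenate([X[:, :, k] for k in range(X.shape[2])], axis=0)

def bvfold(V, n3):
    """Inverse of bvec."""
    n1 = V.shape[0] // n3
    return np.stack([V[k * n1:(k + 1) * n1, :] for k in range(n3)], axis=2)

def bdiag(X):
    """Place the frontal slices on the block diagonal."""
    n1, n2, n3 = X.shape
    D = np.zeros((n1 * n3, n2 * n3), dtype=X.dtype)
    for k in range(n3):
        D[k * n1:(k + 1) * n1, k * n2:(k + 1) * n2] = X[:, :, k]
    return D

def bdfold(D, n3):
    """Inverse of bdiag."""
    n1, n2 = D.shape[0] // n3, D.shape[1] // n3
    return np.stack([D[k * n1:(k + 1) * n1, k * n2:(k + 1) * n2]
                     for k in range(n3)], axis=2)

X = np.arange(24, dtype=float).reshape(2, 3, 4)
B = bcirc(X)   # shape (2 * 4, 3 * 4)
```

Note that the first block column of `bcirc(X)` is exactly `bvec(X)`, a fact the paper uses later when relating t-TNN to the mode-3 unfolding.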


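The t-product between two 3-way tensors can then be defined as follows [20]:

Definition 1 (t-product). Let $\mathcal{X}$ be $n_1 \times n_2 \times n_3$ and $\mathcal{Y}$ be $n_2 \times n_4 \times n_3$. The t-product $\mathcal{X} * \mathcal{Y}$ is an $n_1 \times n_4 \times n_3$ tensor

$\mathcal{M} = \mathcal{X} * \mathcal{Y} =: \mathrm{bvfold}\{\mathrm{bcirc}(\mathcal{X})\,\mathrm{bvec}(\mathcal{Y})\}$. (4)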


The t-product is analogous to the matrix multiplication 
except that the circular convolution replaces the multiplica¬ 
tion operation between the elements, which are now mode-3 
fibers ca, as follows : 

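$\mathcal{M}(i,j,:) = \sum_{k=1}^{n_2} \mathcal{X}(i,k,:) \circ \mathcal{Y}(k,j,:)$, (5)

where $\circ$ denotes the circular convolution between two tubes. The t-product in the original domain corresponds to the matrix multiplication of the frontal slices in the Fourier domain, as follows:

$M_f^{(k)} = X_f^{(k)} Y_f^{(k)}, \quad k = 1, \ldots, n_3$. (6)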
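The equivalence between the circular-convolution form and the Fourier-domain form of the t-product can be checked numerically. The sketch below is an illustration under our own naming (`t_product_fft`, `t_product_conv`); tensors are NumPy arrays with the third mode on the last axis.

```python
import numpy as np

def t_product_fft(A, B):
    """t-product via slice-wise matrix products in the Fourier domain."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    Mf = np.einsum('ikt,kjt->ijt', Af, Bf)   # one matrix product per frontal slice
    return np.fft.ifft(Mf, axis=2).real      # real inputs give a real result

def t_product_conv(A, B):
    """t-product via circular convolution of mode-3 fibers (tubes)."""
    n1, _, n3 = A.shape
    M = np.zeros((n1, B.shape[1], n3))
    for s in range(n3):
        for t in range(n3):
            # slice s of A times slice t of B contributes to slice (s + t) mod n3
            M[:, :, (s + t) % n3] += A[:, :, s] @ B[:, :, t]
    return M

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4, 5))
B = rng.standard_normal((4, 2, 5))
M_fft = t_product_fft(A, B)
M_conv = t_product_conv(A, B)
```

The FFT route costs one transform per tube plus n3 small matrix products, whereas the direct convolution needs n3 squared slice products, which is the practical reason for working in the Fourier domain.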

Next we define the related notions of the tensor transpose, identity tensor, orthogonal tensor and f-diagonal tensor [20].

Definition 2 (Tensor Transpose). Let $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$; the transpose tensor $\mathcal{X}^T$ is an $n_2 \times n_1 \times n_3$ tensor obtained by transposing each frontal slice of $\mathcal{X}$ and then reversing the order of the transposed frontal slices 2 through $n_3$.

Definition 3 (Identity Tensor). The identity tensor $\mathcal{I} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ is a tensor whose first frontal slice is the $n_1 \times n_1$ identity matrix and all other frontal slices are zero.

Definition 4 (Orthogonal Tensor). A tensor $\mathcal{Q} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ is orthogonal if

$\mathcal{Q}^T * \mathcal{Q} = \mathcal{Q} * \mathcal{Q}^T = \mathcal{I}$, (7)

where $*$ is the t-product.

Definition 5 (f-diagonal Tensor). A tensor is called f-diagonal if each of its frontal slices is a diagonal matrix. The t-product of two f-diagonal tensors with the same size $n_1 \times n_2 \times n_3$, i.e., $\mathcal{M} = \mathcal{X} * \mathcal{Y}$, is also an $n_1 \times n_2 \times n_3$ f-diagonal tensor, and its diagonal tube fibers are

$\mathcal{M}(i,i,:) = \mathcal{X}(i,i,:) \circ \mathcal{Y}(i,i,:), \quad i = 1, \ldots, \min(n_1, n_2)$. (8)


4 Method 

In this section we describe the proposed method in detail. In Section 4.1 we introduce t-SVD and the corresponding tensor multi-rank, then GTNN is provided for the purpose of convex relaxation. In Section 4.2 a twist tensor is first defined, which leads to our new low-rank tensor model t-TNN. We also investigate the relationship between t-TNN and the traditional MNN of mode-3 unfolding and compare t-TNN with GTNN by transforming t-TNN into the circulant block representation. Section 4.3 presents the optimization algorithm for the t-TNN based tensor completion.

4.1 Generalized tensor nuclear norm (GTNN) 

For a 3-way tensor $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the t-SVD of $\mathcal{X}$ is given by

$\mathcal{X} = \mathcal{U} * \mathcal{S} * \mathcal{V}^T$, (9)

where $\mathcal{U}$ and $\mathcal{V}$ are orthogonal tensors of size $n_1 \times n_1 \times n_3$ and $n_2 \times n_2 \times n_3$ respectively, $\mathcal{S}$ is an f-diagonal tensor of size $n_1 \times n_2 \times n_3$, and $*$ denotes the t-product. Fig. 2 illustrates the decomposition. As demonstrated in Eq. (6), the t-product can be computed efficiently in the Fourier domain, which leads to Alg. 1 for obtaining the t-SVD [20].



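Algorithm 1: t-SVD
Input: $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$.
Output: $\mathcal{U}$, $\mathcal{S}$, $\mathcal{V}$.
  $\mathcal{X}_f = \mathrm{fft}(\mathcal{X}, [\,], 3)$.
  for $k = 1 : n_3$ do
    $[U, \Sigma, V] = \mathrm{SVD}(X_f^{(k)})$.
    $U_f^{(k)} = U$, $S_f^{(k)} = \Sigma$, $V_f^{(k)} = V$.
  end
  $\mathcal{U} = \mathrm{ifft}(\mathcal{U}_f, [\,], 3)$, $\mathcal{S} = \mathrm{ifft}(\mathcal{S}_f, [\,], 3)$, $\mathcal{V} = \mathrm{ifft}(\mathcal{V}_f, [\,], 3)$.
  Return $\mathcal{U}$, $\mathcal{S}$, $\mathcal{V}$.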
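Algorithm 1 can be sketched in NumPy as follows. One detail is our own addition rather than part of the listing: the SVD is computed only for the first half of the Fourier slices, and the remaining slices are filled by complex conjugation, so that the inverse FFT returns real factors. The helper names (`t_product`, `t_transpose`, `t_svd`) are hypothetical.

```python
import numpy as np

def t_product(A, B):
    """t-product computed slice-wise in the Fourier domain."""
    Af, Bf = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.fft.ifft(np.einsum('ikt,kjt->ijt', Af, Bf), axis=2).real

def t_transpose(X):
    """Transpose each frontal slice, then reverse the order of slices 2..n3."""
    Xt = np.transpose(X, (1, 0, 2))
    return np.concatenate([Xt[:, :, :1], Xt[:, :, :0:-1]], axis=2)

def t_svd(X):
    """t-SVD of a real tensor: X = U * S * V^T with real U, S, V."""
    n1, n2, n3 = X.shape
    r = min(n1, n2)
    d = np.arange(r)
    Xf = np.fft.fft(X, axis=2)
    Uf = np.zeros((n1, n1, n3), dtype=complex)
    Sf = np.zeros((n1, n2, n3), dtype=complex)
    Vf = np.zeros((n2, n2, n3), dtype=complex)
    for k in range(n3 // 2 + 1):
        slice_k = Xf[:, :, k]
        if k == 0 or 2 * k == n3:
            slice_k = slice_k.real          # these slices are real for real X
        U, s, Vh = np.linalg.svd(slice_k)
        Uf[:, :, k] = U
        Sf[d, d, k] = s
        Vf[:, :, k] = Vh.conj().T
        if 0 < k < n3 - k:                  # fill the mirrored slice by conjugation
            Uf[:, :, n3 - k] = U.conj()
            Sf[d, d, n3 - k] = s
            Vf[:, :, n3 - k] = Vh.T
    to_real = lambda T: np.fft.ifft(T, axis=2).real
    return to_real(Uf), to_real(Sf), to_real(Vf)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3, 6))
U, S, V = t_svd(X)
X_rec = t_product(t_product(U, S), t_transpose(V))
```

Without the conjugate fill, independent SVDs of mirrored slices may differ by arbitrary phase factors, and the inverse FFT of the factors would then not be real in general.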


Resorting to t-SVD, we can define the tensor multi-rank as follows [15], [16], [17]:

Definition 6 (Tensor multi-rank). The multi-rank of $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is a vector $r \in \mathbb{R}^{n_3}$ with the $i$th element equal to the rank of the $i$th frontal slice of $\mathcal{X}_f$.

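Then the GTNN is given as

$\|\mathcal{X}\|_\circledast := \sum_{k=1}^{n_3} \sum_{i=1}^{\min(n_1,n_2)} |\mathcal{S}_f(i,i,k)|$, (10)

which is proven to be a valid norm and the tightest convex relaxation to the $\ell_1$ norm of the tensor multi-rank in [15], [17]. From Alg. 1 it can be seen that

$\mathrm{bdiag}(\mathcal{X}_f) = \mathrm{bdiag}(\mathcal{U}_f)\,\mathrm{bdiag}(\mathcal{S}_f)\,\mathrm{bdiag}(\mathcal{V}_f)^T$. (11)

Due to the unitary invariance of MNN, we have

$\|\mathrm{bdiag}(\mathcal{X}_f)\|_* = \|\mathrm{bdiag}(\mathcal{S}_f)\|_* = \|\mathcal{X}\|_\circledast$, (12)

and since block circulant matrices can be block diagonalized by using the Fourier transform, there is

$\|\mathrm{bdiag}(\mathcal{X}_f)\|_* = \|(F_{n_3} \otimes I_{n_1})\,\mathrm{bcirc}(\mathcal{X})\,(F_{n_3}^* \otimes I_{n_2})\|_* = \|\mathrm{bcirc}(\mathcal{X})\|_*$. (13)

Finally, we obtain

$\|\mathcal{X}\|_\circledast = \|\mathrm{bcirc}(\mathcal{X})\|_*$. (14)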
The equivalence in Eq. (14) endows the GTNN with interpretability in the original domain. We can see that the block circulant representation of $\mathcal{X}$ preserves the spatial relationship between entries, and $\|\mathrm{bcirc}(\mathcal{X})\|_*$ measures the rank of $\mathrm{bcirc}(\mathcal{X})$ by comparing every row and every column of frontal slices over the third dimension (especially the time dimension for video data), which exploits the spatial-temporal information of a tensor more deeply than the MNN of a single unfolding.
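The identity between the Fourier-domain sum of singular values and the nuclear norm of the block circulant matrix can be verified numerically. The sketch below assumes NumPy's unnormalized FFT convention, under which the two quantities agree exactly; the function names are ours.

```python
import numpy as np

def gtnn(X):
    """Sum of the singular values of all frontal slices in the Fourier domain."""
    Xf = np.fft.fft(X, axis=2)
    slices = np.moveaxis(Xf, 2, 0)                    # shape (n3, n1, n2) for batched SVD
    return np.linalg.svd(slices, compute_uv=False).sum()

def bcirc(X):
    """Block circulant matrix: block (a, b) is the frontal slice (a - b) mod n3."""
    n1, n2, n3 = X.shape
    idx = (np.arange(n3)[:, None] - np.arange(n3)[None, :]) % n3
    return X[:, :, idx].transpose(2, 0, 3, 1).reshape(n3 * n1, n3 * n2)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3, 5))
lhs = gtnn(X)                                  # Fourier-domain definition
rhs = np.linalg.norm(bcirc(X), ord='nuc')      # nuclear norm of bcirc(X)
```

The Fourier-domain route needs n3 SVDs of n1 x n2 matrices, whereas the direct route needs one SVD of an (n1 n3) x (n2 n3) matrix, so the former is the one used in practice.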


4.2 Twist tensor nuclear norm (t-TNN) 

Before introducing our new tensor nuclear norm, we need 
to define the twist tensor as follows : 





Figure 3. The twist and squeeze operations. 



Figure 4. Mode-1, mode-2 and mode-3 unfoldings of a 3-way tensor $\mathcal{X}$ and the block circulant matricization of the twist tensor $\vec{\mathcal{X}}$. The unfolding operation corresponds to aligning the corresponding slices for each mode next to each other. It can be seen that the first block column of $\mathrm{bcirc}(\vec{\mathcal{X}})$, i.e., $\mathrm{bvec}(\vec{\mathcal{X}})$, is equal to the mode-3 unfolding of $\mathcal{X}$.



Figure 5. Circulant block representation of $\mathcal{X}$ and $\vec{\mathcal{X}}$ by permuting $\mathrm{bcirc}(\mathcal{X})$ and $\mathrm{bcirc}(\vec{\mathcal{X}})$.


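Definition 7 (Tensor Twist and Squeeze). Let $\mathcal{X} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, then the twist tensor $\mathcal{Y} = \vec{\mathcal{X}}$ is an $n_1 \times n_3 \times n_2$ tensor whose lateral slice $\mathcal{Y}(:,k,:) = \mathrm{twist}(X^{(k)})$. Correspondingly, the squeeze tensor of $\mathcal{Y}$, i.e., $\mathcal{X} = \overleftarrow{\mathcal{Y}}$, can be obtained by the reverse process, i.e., $X^{(k)} = \mathrm{squeeze}(\mathcal{Y}(:,k,:))$.

Fig. 3 illustrates the twist and squeeze operations.

The t-TNN based on the t-SVD framework is then

$\|\mathcal{X}\|_{\vec{\circledast}} := \|\vec{\mathcal{X}}\|_\circledast = \|\mathrm{bcirc}(\vec{\mathcal{X}})\|_*$. (15)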

For a deeper insight into t-TNN, we examine the relationship between t-TNN and the traditional MNN, which is illustrated in Fig. 4. Consider 3D video data $\mathcal{X}$ of size $n_1 \times n_2 \times n_3$ (length x width x frames). The mode-3 unfolding of $\mathcal{X}$ can be obtained by vectorizing each frame and then aligning the vectors in time order. Due to the content continuity, i.e., the pixels at the same location in consecutive frames tend to change little, $\|X_{(3)}\|_*$ can effectively exploit the temporal redundancy between frames. However, the vectorization of frames ignores the spatial information of pixels within one frame. By extending $X_{(3)}$ in a block circulant way, $\mathrm{bcirc}(\vec{\mathcal{X}})$ contains a latent spatial feature for each frame along the row direction.
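The twist and squeeze operations amount to swapping the second and third axes of the array, so the t-TNN can be sketched as the Fourier-domain singular-value sum of the twisted tensor. The code below is an illustrative NumPy sketch with our own function names; it also checks the statement that the first block column of the block circulant matricization of the twist tensor holds one vectorized frame per column (the mode-3 unfolding, up to transposition and ordering conventions).

```python
import numpy as np

def twist(X):
    """(n1, n2, n3) -> (n1, n3, n2): frame k becomes the k-th lateral slice."""
    return np.transpose(X, (0, 2, 1))

def squeeze(Y):
    """Inverse of twist."""
    return np.transpose(Y, (0, 2, 1))

def ttnn(X):
    """t-TNN: singular-value sum of the Fourier slices of the twist tensor."""
    Yf = np.fft.fft(twist(X), axis=2)       # the FFT now runs along the column index
    return np.linalg.svd(np.moveaxis(Yf, 2, 0), compute_uv=False).sum()

def bvec(Y):
    """Stack the frontal slices vertically."""
    return np.concatenate([Y[:, :, j] for j in range(Y.shape[2])], axis=0)

rng = np.random.default_rng(0)
video = rng.standard_normal((4, 5, 3))      # n1 x n2 x n3 (rows x columns x frames)

first_block_col = bvec(twist(video))        # shape (n1 * n2, n3)
# Each frame vectorized column by column, frames placed side by side in time order.
frames_as_cols = np.stack(
    [video[:, :, k].ravel(order='F') for k in range(video.shape[2])], axis=1)
value = ttnn(video)
```

After twisting, the FFT in `ttnn` runs along the horizontal image coordinate rather than along time, which is the structural difference from GTNN.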


Next, we analyze the t-TNN model from the perspective 
of circulant block representation 1^ . For a tensor X with 
size m X n X k, the circulant block matricization of X is 
defined as follows : 


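$\mathrm{circ}(\mathcal{X}) := \begin{bmatrix} \mathrm{circ}(\mathcal{X}_{11:}) & \mathrm{circ}(\mathcal{X}_{12:}) & \cdots & \mathrm{circ}(\mathcal{X}_{1n:}) \\ \mathrm{circ}(\mathcal{X}_{21:}) & \mathrm{circ}(\mathcal{X}_{22:}) & \cdots & \mathrm{circ}(\mathcal{X}_{2n:}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{circ}(\mathcal{X}_{m1:}) & \mathrm{circ}(\mathcal{X}_{m2:}) & \cdots & \mathrm{circ}(\mathcal{X}_{mn:}) \end{bmatrix}$, (16)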

where $\mathcal{X}_{ij:} = \mathcal{X}(i,j,:)$ is a mode-3 fiber which belongs to a length-$k$ vector space $\mathbb{K}_k$, and $\mathrm{circ}(\mathcal{X}_{ij:})$ constructs a $\mathbb{K}_k$-circulant module or block with $\mathcal{X}_{ij:}$. By permutation, there exists a relationship between $\mathrm{circ}(\mathcal{X})$ and $\mathrm{bcirc}(\mathcal{X})$ as follows:

$\mathrm{circ}(\mathcal{X}) = P_1\,\mathrm{bcirc}(\mathcal{X})\,P_2$, (17)

where $P_i$, $i = 1, 2$, denote so-called stride permutations [35]. Fig. 5 illustrates the transformation of both $\mathrm{bcirc}(\mathcal{X})$ and $\mathrm{bcirc}(\vec{\mathcal{X}})$ with a 3 x 3 x 3 tensor as an example. Because of the permutation invariance, we have

$\|\mathrm{circ}(\mathcal{X})\|_* = \|\mathrm{bcirc}(\mathcal{X})\|_*$. (18)

Considering 3D video data $\mathcal{X}$, $\mathcal{X}_{ij:}$ corresponds to a time tube at position $(i, j)$, while $\vec{\mathcal{X}}_{ik:}$ is the $i$th row of the $k$th frame. Compared to $\mathrm{circ}(\mathcal{X})$, $\mathrm{circ}(\vec{\mathcal{X}})$ is more suitable for describing a scene with a global panning motion, because $\|\mathrm{circ}(\vec{\mathcal{X}})\|_*$ measures the extent of the change in the rows across frames in a circulant way. In other words, when a video sequence is recorded by a camera in panning motion, there exists a horizontal translation of pixels over time, and $\mathrm{circ}(\vec{\mathcal{X}})$ exploits this linear relationship between frames.
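The permutation relationship between the circulant block and block circulant matricizations can be checked directly. The sketch below is an illustration only: it represents the stride permutations as index arrays rather than as explicit permutation matrices, and all function names are our own.

```python
import numpy as np

def _circulant_entries(X):
    """Return C with C[i, j, a, b] = X[i, j, (a - b) mod k]."""
    k = X.shape[2]
    idx = (np.arange(k)[:, None] - np.arange(k)[None, :]) % k
    return X[:, :, idx]

def bcirc(X):
    """Block circulant matrix: k x k grid of m x n frontal slices."""
    m, n, k = X.shape
    return _circulant_entries(X).transpose(2, 0, 3, 1).reshape(k * m, k * n)

def circ(X):
    """Circulant block matrix: m x n grid of k x k circulants, one per tube."""
    m, n, k = X.shape
    return _circulant_entries(X).transpose(0, 2, 1, 3).reshape(m * k, n * k)

def stride_perm(k, m):
    """Index array p with p[i * k + a] = a * m + i (a stride permutation)."""
    return np.arange(k * m).reshape(k, m).T.ravel()

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4, 5))
rows, cols = stride_perm(5, 3), stride_perm(5, 4)
permuted = bcirc(X)[np.ix_(rows, cols)]     # reorder rows and columns of bcirc(X)

nuc_circ = np.linalg.norm(circ(X), ord='nuc')
nuc_bcirc = np.linalg.norm(bcirc(X), ord='nuc')
```

Reordering rows and columns does not change singular values, so the two nuclear norms agree, as stated in Eq. (18).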

4.3 Tensor completion from missing values

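The task of tensor completion is to recover the latent tensor $\mathcal{M}$ from missing values, which can be addressed by solving the following convex optimization problem

$\min_{\mathcal{X}} \|\mathcal{X}\|_{\vec{\circledast}}, \quad \mathrm{s.t.}\ \mathcal{P}_\Omega(\mathcal{X}) = \mathcal{P}_\Omega(\mathcal{M})$, (19)

where $\mathcal{P}_\Omega$ is the orthogonal projector onto the span of tensors vanishing outside of $\Omega$, namely,

$[\mathcal{P}_\Omega(\mathcal{X})]_{ijk} = \begin{cases} x_{ijk}, & (i,j,k) \in \Omega, \\ 0, & \text{otherwise}, \end{cases}$ (20)

and $\mathcal{P}_{\Omega^\perp}$ is the complementary projection, i.e., $\mathcal{P}_\Omega(\mathcal{X}) + \mathcal{P}_{\Omega^\perp}(\mathcal{X}) = \mathcal{X}$.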

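We adopt the alternating direction method of multipliers (ADMM) [36] to solve problem (19). By introducing a new tensor variable $\mathcal{Y} = \mathcal{X}$, we obtain the following objective function

$L(\mathcal{X}, \mathcal{Y}, \mathcal{W}) = \|\mathcal{Y}\|_{\vec{\circledast}} + 1_{\mathcal{X}_\Omega = \mathcal{M}_\Omega} + \langle \mathcal{W}, \mathcal{X} - \mathcal{Y} \rangle + \frac{\rho}{2}\|\mathcal{X} - \mathcal{Y}\|_F^2$, (21)

where $1$ denotes the indicator function. According to the framework of ADMM, we can iteratively update $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{W}$, as follows:

$\mathcal{X}^{k+1} = \arg\min_{\mathcal{X}: \mathcal{X}_\Omega = \mathcal{M}_\Omega} \|\mathcal{X} - (\mathcal{Y}^k - \frac{1}{\rho}\mathcal{W}^k)\|_F^2$, (22)

$\mathcal{Y}^{k+1} = \arg\min_{\mathcal{Y}} \|\mathcal{Y}\|_{\vec{\circledast}} + \frac{\rho}{2}\|\mathcal{Y} - (\mathcal{X}^{k+1} + \frac{1}{\rho}\mathcal{W}^k)\|_F^2$, (23)

$\mathcal{W}^{k+1} = \mathcal{W}^k + \rho(\mathcal{X}^{k+1} - \mathcal{Y}^{k+1})$, (24)

where Eq. (22) is a least-square projection onto the constraint and its solution is

$\mathcal{X}^{k+1} = \mathcal{P}_\Omega(\mathcal{M}) + \mathcal{P}_{\Omega^\perp}(\mathcal{Y}^k - \frac{1}{\rho}\mathcal{W}^k)$. (25)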

P 

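We can obtain the solution for Eq. (23) through the following theorem.

Theorem 1. For $\tau > 0$ and $\mathcal{Y}, \mathcal{Z} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$, the twist tensor of the globally optimal solution to the following problem

$\min_{\mathcal{Y}} \tau\|\mathcal{Y}\|_{\vec{\circledast}} + \frac{1}{2}\|\mathcal{Y} - \mathcal{Z}\|_F^2$ (26)

is given by the tensor singular value convoluting (tSVC)

$\vec{\mathcal{Y}} = \mathcal{C}_\tau(\vec{\mathcal{Z}}) = \mathcal{U} * \mathcal{C}_\tau(\mathcal{S}) * \mathcal{V}^T$, (27)

where $\vec{\mathcal{Z}} = \mathcal{U} * \mathcal{S} * \mathcal{V}^T$ and $\mathcal{C}_\tau(\mathcal{S}) = \mathcal{S} * \mathcal{J}$; herein, $\mathcal{J}$ is an $n_1 \times n_3 \times n_2$ f-diagonal tensor whose diagonal element in the Fourier domain is $\mathcal{J}_f(i,i,j) = (1 - \tau / \mathcal{S}_f(i,i,j))_+$.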


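Algorithm 2: t-TNN based tensor completion
Input: Observation data $\mathcal{M}$, projector $\mathcal{P}_\Omega$.
Output: Completion tensor $\mathcal{X}$.
  Initialize: $\rho^0 > 0$, $\eta > 1$, $k = 0$,
  $\mathcal{X} = \mathcal{P}_\Omega(\mathcal{M})$, $\mathcal{Y} = \mathcal{W} = 0$.
  while $\|\mathcal{X} - \mathcal{Y}\|_F / \|\mathcal{X}\|_F > tol$ and $k < K$ do
    $\mathcal{X}^{k+1} = \mathcal{P}_\Omega(\mathcal{M}) + \mathcal{P}_{\Omega^\perp}(\mathcal{Y}^k - \frac{1}{\rho^k}\mathcal{W}^k)$;
    $\tau = \frac{1}{\rho^k}$, $\mathcal{Z} = \mathcal{X}^{k+1} + \frac{1}{\rho^k}\mathcal{W}^k$;
    $\vec{\mathcal{Z}}_f = \mathrm{fft}(\vec{\mathcal{Z}}, [\,], 3)$;
    for $j = 1 : n_2$ do
      $[U_f^{(j)}, S_f^{(j)}, V_f^{(j)}] = \mathrm{SVD}(\vec{Z}_f^{(j)})$;
      $J_f^{(j)} = \mathrm{diag}\{(1 - \tau / S_f^{(j)}(i,i))_+\}$;
      $S_{f,\tau}^{(j)} = S_f^{(j)} J_f^{(j)}$;
      $H_f^{(j)} = U_f^{(j)} S_{f,\tau}^{(j)} V_f^{(j)T}$;
    end
    $\mathcal{H} = \mathrm{ifft}(\mathcal{H}_f, [\,], 3)$, $\mathcal{Y}^{k+1} = \overleftarrow{\mathcal{H}}$;
    $\mathcal{W}^{k+1} = \mathcal{W}^k + \rho^k(\mathcal{X}^{k+1} - \mathcal{Y}^{k+1})$;
    $\rho^{k+1} = \eta\rho^k$, $k = k + 1$;
  end
  Return tensor $\mathcal{X}$.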
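A compact NumPy sketch of the ADMM loop is given below. It is an illustration under several assumptions of ours, not the authors' implementation: the default values of `rho`, `eta`, `tol` and `max_iter` are arbitrary choices, the mask is a boolean array marking the observed entries, and the Fourier-domain singular values are shrunk by 1/rho directly as in the listing. Under NumPy's unnormalized FFT this corresponds to the proximal operator of the norm up to a constant factor, which does not change the minimizer of the constrained problem (19).

```python
import numpy as np

def twist(X):
    return np.transpose(X, (0, 2, 1))

def squeeze(Y):
    return np.transpose(Y, (0, 2, 1))

def tsvc(Z, tau):
    """Shrink the Fourier-domain singular values of twist(Z) by tau."""
    Zf = np.fft.fft(twist(Z), axis=2)
    slices = np.moveaxis(Zf, 2, 0)                        # batch of n2 matrices
    U, s, Vh = np.linalg.svd(slices, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    Hf = (U * s[:, None, :]) @ Vh                         # U diag(s) V^H per slice
    H = np.fft.ifft(np.moveaxis(Hf, 0, 2), axis=2).real
    return squeeze(H)

def ttnn_complete(M, mask, rho=1e-4, eta=1.1, tol=1e-6, max_iter=500):
    """Fill the entries of M where mask is False (hypothetical helper)."""
    X = np.where(mask, M, 0.0)
    Y = np.zeros_like(X)
    W = np.zeros_like(X)
    for _ in range(max_iter):
        if np.linalg.norm(X - Y) <= tol * np.linalg.norm(X):
            break
        X = np.where(mask, M, Y - W / rho)    # keep observed entries, update the rest
        Y = tsvc(X + W / rho, 1.0 / rho)      # singular-value shrinkage step
        W = W + rho * (X - Y)                 # multiplier update
        rho *= eta                            # increase the penalty parameter
    return X

# Synthetic example: a rank-1 tensor with roughly 70% of its entries observed.
rng = np.random.default_rng(0)
a, b, c = (rng.standard_normal(16) for _ in range(3))
M = np.einsum('i,j,k->ijk', a, b, c)
mask = rng.random(M.shape) < 0.7
X_hat = ttnn_complete(M, mask)
rel_err = np.linalg.norm(X_hat - M) / np.linalg.norm(M)
```

The batched call to `np.linalg.svd` replaces the inner for-loop over slices; the per-slice work is otherwise the same.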


The proof of Theorem 1 can be found in the Appendix, and the optimization procedure of our t-TNN based tensor completion is described in Alg. 2. The convergence of ADMM has been proved in [36], and the computational bottleneck of Alg. 2 lies in computing the 3D FFT and 3D inverse FFT of an $n_1 \times n_3 \times n_2$ tensor and $n_2$ SVDs of $n_1 \times n_3$ matrices in the Fourier domain. There is no need to run over all $n_2$ SVDs because of the conjugate symmetry in the Fourier domain. For example, if $n_2$ is even, we can run the SVDs for $j = 1, \ldots, \frac{n_2}{2} + 1$, and then populate the remaining $\frac{n_2}{2} - 1$ SVDs for $j = 2, \ldots, \frac{n_2}{2}$ as follows:

$U_f^{(n_2 - j + 2)} = \mathrm{conj}(U_f^{(j)})$, (28)

$S_f^{(n_2 - j + 2)} = S_f^{(j)}$, (29)

$V_f^{(n_2 - j + 2)} = \mathrm{conj}(V_f^{(j)})$. (30)
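In NumPy, one convenient way to exploit this conjugate symmetry is the real-input FFT pair `np.fft.rfft` / `np.fft.irfft`, which stores only the non-redundant half of the Fourier slices. The sketch below is our own illustration (function names are hypothetical, and the input is assumed to be an already twisted real tensor); it compares the shrinkage step computed from all slices with the one computed from roughly half of them.

```python
import numpy as np

def shrink_full(Z, tau):
    """Shrinkage using SVDs of all Fourier slices."""
    Zf = np.fft.fft(Z, axis=2)
    U, s, Vh = np.linalg.svd(np.moveaxis(Zf, 2, 0), full_matrices=False)
    Hf = (U * np.maximum(s - tau, 0.0)[:, None, :]) @ Vh
    return np.fft.ifft(np.moveaxis(Hf, 0, 2), axis=2).real

def shrink_half(Z, tau):
    """Shrinkage using only the n // 2 + 1 non-redundant Fourier slices."""
    n = Z.shape[2]
    Zf = np.fft.rfft(Z, axis=2)                   # keeps slices 0 .. n // 2
    U, s, Vh = np.linalg.svd(np.moveaxis(Zf, 2, 0), full_matrices=False)
    Hf = (U * np.maximum(s - tau, 0.0)[:, None, :]) @ Vh
    # irfft rebuilds the mirrored slices implicitly by conjugate symmetry
    return np.fft.irfft(np.moveaxis(Hf, 0, 2), n=n, axis=2)

rng = np.random.default_rng(0)
Z_even = rng.standard_normal((4, 3, 6))
Z_odd = rng.standard_normal((4, 3, 5))
out_even = (shrink_full(Z_even, 0.5), shrink_half(Z_even, 0.5))
out_odd = (shrink_full(Z_odd, 0.5), shrink_half(Z_odd, 0.5))
```

For a transform length of 6, the half version performs 4 SVDs instead of 6, which matches the count $\frac{n_2}{2} + 1$ quoted above for even $n_2$.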

Since in most video data we have $n_1, n_2 > n_3$ and $\log(n_2) < n_1, n_3$, the computation at each iteration will take $O(2 n_1 n_2 n_3 \log(n_2) + \frac{1}{2} n_1 n_2 n_3^2) \approx O(n_1 n_2 n_3^2)$ (without considering the parallel computing for each SVD) compared to $O(\min(n_1^2 n_2, n_1 n_2^2)\, n_3)$ using GTNN and $O(n_1 n_2 n_3^2)$ using MNN for the mode-3 unfolding.

5 Experiments 

In this section we compare our t-TNN model with five other tensor-based or matrix-based models for real video completion: GTNN [17], SNN [13], LNN [14], TMac [33], and MNN [21]. All the completion methods but TMac are implemented in the ADMM framework, and their parameter settings are empirically determined to give the best performance and are fixed in all tests. For TMac, the completion results are generated from the source codes released by their author (see footnote 1). Nine video datasets recorded by a stationary or non-stationary camera are used to verify the effectiveness of the proposed model.

1. http://www.math.ucla.edu/~wotaoyin/papers/tmac.html

Figure 7. Magnification of image content within green rectangles in Fig. 6 with different methods. (a) Full, (b) t-TNN, (c) GTNN, (d) SNN, (e) LNN, (f) TMac, (g) MNN.


5.1 Video completion from limited samples 

We first test our t-TNN model by completing the video from limited entries. The entries are sampled according to the Bernoulli model, which means that each entry in the video data is sampled with probability p independently of the others. Fig. 6 shows the completion result of an example frame for each video when p = 0.3, and the image content within the green rectangles is magnified in Fig. 7. For videos recorded by a non-stationary panning camera [Fig. 6(a) - Fig. 6(g)], it can be clearly observed that t-TNN and GTNN, both of which exploit the spatial and temporal relationships between entries simultaneously on the t-SVD framework, perform better than other methods. For the stationary video Led [Fig. 6(h)], t-TNN and GTNN reconstruct the moving LED signs more accurately, while for the stationary surveillance video Escalator [Fig. 6(i)], t-TNN and GTNN have relatively weaker advantages over other methods.



Figure 8. RSE per frame (p = 0.3). For the $k$th frame, $\mathrm{RSE}^k = 20\log_{10}(\|X^{(k)} - M^{(k)}\|_F / \|M^{(k)}\|_F)$. (a) Windmill, (b) Led, (c) Escalator.


We can also see that t-TNN recovers more textures and finer detail than GTNN in Fig. 7. This point is further demonstrated by the RSE curves of frames plotted in Fig. 8, where three different types of videos, Windmill, Led, and Escalator, are taken as examples. It is worth noting that serious "terminal tails" exist in the GTNN completion result for the panning video Windmill [see Fig. 8(a)]. Fig. 9 shows the corresponding terminal frames in which the ghost of a windmill appears [see Fig. 9(a) and (b)]. This phenomenon is caused by the translation of image content, since the image circulant representation in the GTNN model cannot exploit the linear relationship between frames. In contrast, the t-TNN completion results effectively suppress the terminal tails due to the row circulant representation over time.

Figure 9. Terminal tails (p = 0.3). (a) - (b) are terminal frames of the GTNN completion result where the residual images of the windmill appear, (c) - (d) are terminal frames of the t-TNN completion result.

Table 1 summarizes the average inverse RSE (iRSE) and running time for each video when p = 0.1, 0.2, ..., 0.9. The highest average iRSE and lowest average time are highlighted in bold. It can be observed that the proposed t-TNN model outperforms the other models, especially when dealing with panning videos, and a 0.8dB - 3.7dB decrease in RSE is achieved over the GTNN model. With respect to time consumption, t-TNN is significantly faster than GTNN for videos with large images.


TABLE 1

Summary of average inverse RSE (-dB) and running time (Sec), with $\mathrm{iRSE} = -20\log_{10}(\|\mathcal{X} - \mathcal{M}\|_F / \|\mathcal{M}\|_F)$, for p = 0.1 to 0.9. Each cell reads iRSE / time.

Videos   t-TNN        GTNN         SNN          LNN          TMac         MNN
Build.   23.2 / 66    22.5 / 419   21.2 / 153   12.8 / 237   18.1 / 270   11.3 / 140
W.mill   38.9 / 120   35.6 / 481   32.1 / 102   27.9 / 115   28.3 / 240   25.5 / 70
Trees    32.5 / 100   28.8 / 388   27.1 / 98    21.3 / 207   24.5 / 276   20.7 / 107
People   23.1 / 42    19.9 / 145   19.2 / 52    16.7 / 76    17.6 / 128   16.4 / 33
Bike     28.4 / 141   25.2 / 408   22.4 / 183   16.8 / 264   18.3 / 358   16.2 / 156
Ship     34.8 / 176   31.7 / 653   31.4 / 133   25.6 / 175   29.6 / 303   24.0 / 112
B.ball   24.5 / 30    22.6 / 58    18.7 / 25    16.9 / 34    17.2 / 63    16.1 / 17
Led      29.7 / 142   27.3 / 306   25.0 / 129   21.4 / 158   23.7 / 245   19.6 / 102
Escal.   23.8 / 39    23.8 / 58    19.9 / 11    23.0 / 24    18.4 / 53    22.2 / 9

5.2 Video completion from random occlusion 

We also test the proposed t-TNN model by completing the video from random occlusion. An image patch 0.3 times the image size is cut out from a random position for each frame. Fig. 10 shows the part-completion results of t-TNN and GTNN, and Fig. 11 plots the inverse RSE for each video with different methods. Experimental results similar to those in Section 5.1 can be observed, in which t-TNN outperforms the other methods when dealing with panning videos. More experimental results and analyses are provided in the Appendix.

Figure 6. Completion results from limited samples (p = 0.3). (a) Building (419 x 523 x 30), (b) Windmill (304 x 480 x 40), (c) Trees (311 x 571 x 37), (d) People (240 x 320 x 40), (e) Bike (288 x 512 x 60), (f) Ship (277 x 454 x 60), (g) Basketball (144 x 256 x 40), (h) Led (234 x 431 x 60), (i) Escalator (130 x 160 x 60). (a)-(g) are non-stationary panning videos, whereas (h)-(i) are stationary videos.


6 Conclusion 

In this paper, a t-SVD based low-rank tensor model named t-TNN is proposed to complete a video from limited samples or random occlusion. The t-TNN relaxes the tensor multi-rank defined in the Fourier domain and is equal to the nuclear norm of the block circulant matricization of the twist tensor in the original domain. This two-fold nature of t-TNN has two advantages, namely, efficient computation using FFT and interpretability for real problems. Furthermore, the block circulant matricization of the twist tensor can be transformed into a circulant block representation with invariance of nuclear norm, in which the entries of each circulant block correspond to mode-3 fibers of the twist tensor. In video completion, the circulant block representation compares rows of images along the time dimension in a circulant way, which exploits the temporal redundancy of videos recorded by a non-stationary panning camera and leads to superior completion performance over existing state-of-the-art low-rank models. It is anticipated that the t-TNN model can be used in a wide range of applications in video processing, e.g., background modeling and video denoising.